Tracing Sub-Structure in the European American Population with PCA-Informative Markers
Authors
Abstract
Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals, 307,315 autosomal SNPs). Individual variation lies across a continuum, with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans represent only a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of this data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that can remove redundancy from the selected SNP panels and show that we can effectively remove correlated markers, thus increasing genotyping savings. Only 150-200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure-informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied to sets of ancestry informative markers selected with any method in order to select the most uncorrelated SNPs, significantly decreasing genotyping costs.
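To make the idea of selecting PCA-informative markers and then pruning redundant ones more concrete, here is a minimal sketch on simulated data. It is not the authors' algorithm, which the abstract does not spell out. As assumptions for illustration only, each SNP is scored by its squared loading on the top principal component, and redundancy is handled with a simple greedy correlation filter. The function names `pca_marker_scores` and `select_markers`, the `max_abs_corr` threshold, and the toy allele frequencies are all hypothetical.

```python
import numpy as np

def pca_marker_scores(genotypes, k=1):
    """Score each SNP by its squared loadings on the top-k principal axes.

    Assumption: this scoring rule is illustrative, not taken from the abstract.
    """
    # Rows are individuals, columns are SNPs; center each SNP column.
    centered = genotypes - genotypes.mean(axis=0)
    # Thin SVD: rows of vt are the principal axes in SNP space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (vt[:k] ** 2).sum(axis=0)

def select_markers(genotypes, n_markers, k=1, max_abs_corr=0.9):
    """Greedily keep high-scoring SNPs, skipping near-duplicates.

    Assumption: a plain correlation filter stands in for the redundancy
    removal method mentioned in the abstract.
    """
    n = genotypes.shape[0]
    centered = genotypes - genotypes.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0          # avoid dividing by zero for constant SNPs
    z = centered / std           # standardized columns
    order = np.argsort(pca_marker_scores(genotypes, k))[::-1]
    kept = []
    for j in order:
        if kept:
            # Pearson correlation of candidate j with every SNP already kept.
            corr = z[:, kept].T @ z[:, j] / n
            if np.max(np.abs(corr)) > max_abs_corr:
                continue
        kept.append(int(j))
        if len(kept) == n_markers:
            break
    return kept

# Toy data: two subpopulations that differ in allele frequency at 20 SNPs.
rng = np.random.default_rng(0)
n_per_pop, n_snps, n_informative = 100, 300, 20
freqs_a = np.full(n_snps, 0.5)
freqs_b = np.full(n_snps, 0.5)
freqs_a[:n_informative] = 0.2
freqs_b[:n_informative] = 0.8
pop_a = rng.binomial(2, freqs_a, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, freqs_b, size=(n_per_pop, n_snps))
genotypes = np.vstack([pop_a, pop_b]).astype(float)
genotypes[:, 1] = genotypes[:, 0]   # SNP 1 duplicates SNP 0 (fully redundant)

panel = select_markers(genotypes, n_markers=10)
print(panel)
```

In this toy setup the selected panel should come from the 20 simulated ancestry-informative SNPs, and the duplicated pair should contribute at most one marker. Ancestry-informative SNPs are correlated with one another through the population structure itself, so an overly strict correlation threshold would discard useful markers. That is one reason this filter should be read only as a stand-in for a more careful redundancy removal method.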
Similar resources
Ancestral Informative Marker Selection and Population Structure Visualization Using Sparse Laplacian Eigenfunctions
Identification of a small panel of population structure informative markers can reduce genotyping cost and is useful in various applications, such as ancestry inference in association mapping, forensics and evolutionary theory in population genetics. Traditional methods to ascertain ancestral informative markers usually require the prior knowledge of individual ancestry and have difficulty for ...
Analysis of genetic diversity, phylogenetic relationships and population structure of Arasbaran cornelian cherry (Cornus mas L.) genotypes using ISSR molecular markers
Cornelian cherry (Cornus mas L.), considered the ancestor of cultivated trees in the Arasbaran region, is a medicinally and economically important plant species. However, little is known about the genetic diversity, breeding programs, and population structure of this species in that region. With this in view, the main objectives of the present study were to analyze the genetic diversity, phyloge...
Evaluation of ten SNP Markers for Human Identification and Paternity Analysis in Persian Population
Background: DNA markers are indispensable tools for human identification in forensic science. Single Nucleotide Polymorphisms (SNPs) are one category of these markers, of particular interest for degraded DNA because of their short amplicons. Objectives: Detection of highly informative SNPs by the criteria is the essential step to devel...
Association of rs12913832 in the HERC2 Gene Affecting Human Iris Color Variation
Introduction: Human eye colour as a physical trait is based on the developmental biology and genetic determinants of the structure known as the iris, which is part of the uveal tract of the eye. Prediction of human externally visible characteristics (EVCs) by genotyping informative SNPs in DNA as a biological witness opens up a new avenue in forensic genetics. Variation in iris color relies on the amounts...
European American Stratification in Ovarian Cancer Case Control Data: The Utility of Genome-Wide Data for Inferring Ancestry
We investigated the ability of several principal components analysis (PCA)-based strategies to detect and control for population stratification using data from a multi-center study of epithelial ovarian cancer among women of European-American ethnicity. These include a correction based on an ancestry informative markers (AIMs) panel designed to capture European ancestral variation and correctio...
Journal title:
- PLoS Genetics
Volume 4, Issue
Pages -
Publication date 2008